Skip to content

Vmspawn stuff#11

Open
daandemeyer wants to merge 14 commits intomainfrom
vmspawn-stuff
Open

Vmspawn stuff#11
daandemeyer wants to merge 14 commits intomainfrom
vmspawn-stuff

Conversation

@daandemeyer
Copy link
Copy Markdown
Owner

No description provided.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 31, 2026

Claude review of PR #11 (d330410)

Must fix

  • config_parse_many_full NULL crashsrc/journal-remote/journal-remote-main.c:839 — Passing NULL for conf_file_dirs and dropin_dirname will trigger assertion failures and abort the process
  • nspawn man page missing --runtime-scope=src/nspawn/nspawn.c:524 — Option is in --help and parse_argv but man/systemd-nspawn.xml has no <varlistentry>
  • ptyfwd in_buffer_full not updated on error pathssrc/shared/ptyfwd.c:685 — Hotkey handler error return skips in_buffer_full increment, losing buffered data (regression from refactor)
  • CXL silent ignore on unsupported architecturesrc/vmspawn/vmspawn.c:2343--cxl=yes silently does nothing on non-x86; should error like --tpm=yes does
  • arg_ram_slots not reset in two-part --ram= formsrc/vmspawn/vmspawn.c:541--ram=SIZE:MAXSIZE doesn't reset arg_ram_slots, leaving stale values from prior invocation
  • verify_arguments() overrides .nspawn file settingssrc/nspawn/nspawn.c:1666 — When SETTING_USERNS/SETTING_NETWORK not in mask, verify_arguments() unconditionally overrides arg_userns_mode and arg_private_network based on scope, silently clobbering values loaded from .nspawn settings files

Suggestions

  • RegisterMachineEx D-Bus fallback lostsrc/shared/machine-register.c:44 — New shared code only tries RegisterMachineWithNetwork, losing PIDFD-based RegisterMachineEx fallback nspawn had
  • register_machine no D-Bus fallback on runtime_directory_generic failuresrc/shared/machine-register.c:86 — Returns error without trying D-Bus path, unlike unregister_machine which falls back
  • fw_cfg key-too-long fallback missing kernel null checksrc/vmspawn/vmspawn.c:3333 — Falls through to kernel cmdline without checking kernel != NULL; credential silently lost on aarch64 image boot
  • sdl,gl=on fails on headless/Waylandsrc/vmspawn/vmspawn.c:2582 — Hardcoded -display sdl,gl=on fails without SDL/GL; consider gl=auto
  • TERM value not sanitized for kernel cmdlinesrc/vmspawn/vmspawn.c:3119$TERM placed on kernel cmdline without space/special-char validation
  • virtio-vga-gl should be used with gl=onsrc/vmspawn/vmspawn.c:2587 — Plain virtio-vga doesn't participate in GL acceleration; virtio-vga-gl is the correct variant
  • free_and_replace uses plain free for credential datasrc/vmspawn/vmspawn.c:3184 — Old credential data freed with free() instead of erase_and_free() used by credential framework
  • -U ordering with --runtime-scope=src/nspawn/nspawn.c:1254-U reads arg_runtime_scope which may not be set yet depending on argument order
  • --ephemeral silently ignored with scsi-cdsrc/vmspawn/vmspawn.c:2825--ephemeral and --discard-disk flags silently dropped for CD-ROM drives
  • --system/--user aliases not in vmspawn man pageman/systemd-vmspawn.xml:73 — Code still accepts --system/--user but man page only documents --runtime-scope=
  • machine_credential_add silent failuresrc/nspawn/nspawn.c:6743 — Error path jumps to finish without logging
  • credential file permissions inconsistencysrc/vmspawn/vmspawn.c:3402 — SMBIOS credentials 0600, fw_cfg credentials 0400; should be consistent
  • help text --extra-drive missing scsi-cdsrc/vmspawn/vmspawn.c:227--extra-drive help text and man page don't list scsi-cd
  • CXL ARCHITECTURE_SUPPORTS_CXL includes 32-bitsrc/vmspawn/vmspawn-util.h:42 — 32-bit x86/ARM included but QEMU CXL only works on 64-bit (x86_64, aarch64)
  • scsi-cd "ro" append relies on kernel last-winssrc/vmspawn/vmspawn.c:3443 — Appending "ro" to counteract "rw" is fragile; only works with direct kernel boot, not SMBIOS credential path
  • varlink error log in speculative unregister callsrc/shared/machine-register.c:222varlink_callbo_and_log produces error-level log for a speculative call before D-Bus fallback
  • --grow-image warning missing for scsi-cdsrc/vmspawn/vmspawn.c:3809 — Warnings added for --ephemeral/--discard-disk but not --grow-image with scsi-cd

Nits

  • Unprivileged operation message wordingsrc/nspawn/nspawn.c:1671 — Says "Unprivileged" but now triggered by --runtime-scope=user which root can use
  • include ordering machine-register.hsrc/nspawn/nspawn.c:84 — Out of alphabetical order among includes
  • strv_prepend redundant copysrc/vmspawn/vmspawn.c:3076strv_prepend after strjoin makes redundant copy; use strv_consume_prepend
  • Continuation line indentsrc/vmspawn/vmspawn.c:219 — 28-space indent vs 27-space used elsewhere in help text
  • else without braces for multi-line bodysrc/vmspawn/vmspawn.c:3347 — systemd style requires braces when body spans multiple lines

Workflow run

Comment on lines +836 to +847

const char* const files[] = { config_file, NULL };

return config_parse_many_full(
files,
/* conf_file_dirs= */ NULL,
/* dropin_dirname= */ NULL,
/* root= */ NULL,
/* root_fd= */ XAT_FDROOT,
"Remote\0",
config_item_table_lookup, items,
CONFIG_PARSE_WARN,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: must-fix: Claude: config_parse_many_full() asserts that both conf_file_dirs and dropin_dirname are non-NULL (see src/shared/conf-parser.c where assert(conf_file_dirs) and assert(dropin_dirname) are checked). Passing NULL for both parameters will crash systemd-journal-remote whenever $SYSTEMD_JOURNAL_REMOTE_CONFIG_FILE is set to a path other than /dev/null — which is the normal case when --forward-journal-config=PATH is provided to vmspawn or nspawn.

Consider using config_parse() for the single-file case, or pass valid empty/sentinel values (e.g., STRV_MAKE_EMPTY for conf_file_dirs and "" for dropin_dirname) so the assertions are satisfied while still skipping drop-in processing.

Comment thread src/nspawn/nspawn.c
" Load credential to pass to container from file or\n"
" AF_UNIX stream socket.\n"
"\n%3$sOther:%4$s\n"
" --runtime-scope=system|user\n"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: must-fix: Claude: The --runtime-scope= option is implemented in parse_argv and shown in --help, but man/systemd-nspawn.xml is not updated with a corresponding <varlistentry>. A man page entry should be added similar to the one in man/systemd-vmspawn.xml.

Comment thread man/systemd-vmspawn.xml
<varlistentry>
<term><option>--system</option></term>
<term><option>--user</option></term>
<term><option>--runtime-scope=</option></term>
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: Claude: The man page now only documents --runtime-scope=, but the code still accepts --system and --user as aliases (via ARG_SYSTEM/ARG_USER in parse_argv). Consider keeping --system/--user as additional <term> entries alongside --runtime-scope= to avoid a documentation regression for existing users.

Comment thread src/vmspawn/vmspawn.c
case CONSOLE_GUI:
/* -vga is a convenience option, keep on cmdline */
r = strv_extend_many(&cmdline, "-vga", "virtio");
/* -display has no config file equivalent */
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: Claude: Hardcoding -display sdl,gl=on assumes QEMU was built with SDL support and that a working OpenGL context is available on the host. This will fail on headless systems, remote SSH sessions, or Wayland-only hosts without XWayland. Consider using gl=auto so QEMU can fall back to software rendering when GPU acceleration is unavailable.

Comment thread src/vmspawn/vmspawn.c Outdated
* XTGETTCAP, not all terminal emulators implement this, so let's always propagate
* $TERM if we have it. */
const char *term = getenv("TERM");
if (!isempty(term) && !streq(term, "unknown") /* some CI environments set TERM=unknown */) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: Claude: The host $TERM value is placed directly onto the kernel command line via strjoin. The kernel command line is space-delimited, so if $TERM contained whitespace or special characters it could inject extra kernel parameters. While standard terminfo values are safe, consider adding a basic content check (e.g., reject values containing spaces or =) for defense in depth.

Comment thread src/nspawn/nspawn.c Outdated
#include "nspawn-mount.h"
#include "nspawn-network.h"
#include "nspawn-oci.h"
#include "machine-register.h"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: nit: Claude: machine-register.h is placed between nspawn-oci.h and nspawn-register.h, but alphabetically it belongs earlier (between machine-credential.h and main-func.h). Same issue with machine-register.c in src/shared/meson.build:123 (placed before lsm-util.c instead of after it) and machine-register.h in src/vmspawn/vmspawn.c:89.

Comment thread src/vmspawn/vmspawn.c
Comment on lines +541 to +544
} else {
r = parse_size(e + 1, 1024, &arg_ram_max);
if (r < 0)
return log_error_errno(r, "Failed to parse --ram=%s: %m", optarg);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: nit: Claude: When --ram=BYTES:MAXBYTES is specified (two-part form without slots), arg_ram_slots is not reset to 0. The no-colon branch below (line 552) correctly resets both arg_ram_max and arg_ram_slots to 0, but this else block only sets arg_ram_max. If --ram were specified multiple times (e.g., --ram=2G:8G:4 then --ram=2G:8G), the stale arg_ram_slots value of 4 would persist.

@daandemeyer
Copy link
Copy Markdown
Owner Author

@claude review

Comment thread src/vmspawn/vmspawn.c Outdated
Comment on lines +2343 to +2347
if (ARCHITECTURE_SUPPORTS_CXL && arg_cxl > 0) {
r = qemu_config_key(config_file, "cxl", "on");
if (r < 0)
return r;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: must-fix: When --cxl=true is explicitly requested on an unsupported architecture, this silently does nothing. The TPM option (around line 3147) correctly returns an error in this case:

if (\!ARCHITECTURE_SUPPORTS_TPM) {
        if (arg_tpm > 0)
                return log_error_errno(SYNTHETIC_ERRNO(EOPNOTSUPP), ...);
}

CXL should follow the same pattern: error out if arg_cxl > 0 on an unsupported architecture, and silently disable with a debug log if arg_cxl < 0 (auto).

Comment thread src/nspawn/nspawn.c Outdated
Comment on lines +6729 to +6730
if (r < 0)
goto finish;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: machine_credential_add() does not log on failure. The generic r < 0 path here silently jumps to finish without any error message. Consider using machine_credential_add_and_log() (which exists in machine-credential.c) or add explicit log_error_errno/log_oom calls. Note that the -EEXIST case above already has a custom message, so handle it beforehand if switching to the _and_log variant.

Comment thread src/vmspawn/vmspawn.c Outdated
Comment on lines +2372 to +2381
_cleanup_free_ char *slots = NULL;

r = qemu_config_key(config_file, "maxmem", mem_max);
if (r < 0)
return r;

if (asprintf(&slots, "%u", arg_ram_slots > 0 ? arg_ram_slots : 1u) < 0)
return log_oom();

r = qemu_config_key(config_file, "slots", slots);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: nit: Consider using qemu_config_keyf (already used in this file at lines 2420, 2596, 2600) to avoid the manual asprintf + temporary variable:

r = qemu_config_keyf(config_file, "slots", "%u", arg_ram_slots > 0 ? arg_ram_slots : 1u);

@daandemeyer
Copy link
Copy Markdown
Owner Author

@claude review

Comment thread src/vmspawn/vmspawn.c
return log_oom();

r = qemu_config_section(config_file, "device", "vga0",
"driver", "virtio-vga");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: Claude: suggestion: With gl=on in the display options (line 2582), the VGA device driver should be virtio-vga-gl rather than virtio-vga. The virtio-vga-gl variant (available since QEMU 6.1) supports virgl 3D acceleration; plain virtio-vga does not participate in GL passthrough, so the gl=on display flag won't provide GPU-accelerated rendering inside the guest.

Suggested change
"driver", "virtio-vga");
"driver", "virtio-vga-gl");

Comment thread src/nspawn/nspawn.c Outdated
if (userns_supported()) {
/* Note that arg_userns_ownership is implied by USER_NAMESPACE_PICK further down. */
arg_userns_mode = arg_privileged ? USER_NAMESPACE_PICK : USER_NAMESPACE_MANAGED;
arg_userns_mode = arg_runtime_scope == RUNTIME_SCOPE_SYSTEM ? USER_NAMESPACE_PICK : USER_NAMESPACE_MANAGED;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: Claude: suggestion: The -U handler reads arg_runtime_scope to decide between USER_NAMESPACE_PICK and USER_NAMESPACE_MANAGED. Previously arg_privileged was stable (set once in initialize_defaults() and never modified during parsing). Now that --runtime-scope= can change the value during parsing, the behavior of -U depends on argument order: systemd-nspawn -U --runtime-scope=user behaves differently from systemd-nspawn --runtime-scope=user -U.

Consider deferring the -U resolution to after all arguments are parsed (e.g. in verify_arguments()), so the final value of arg_runtime_scope is always used regardless of option order.

Comment thread src/vmspawn/vmspawn.c Outdated
if (r < 0)
return log_oom();

free_and_replace(existing->data, combined);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: Claude: suggestion: free_and_replace() uses plain free() on the old existing->data, but machine_credential_done() uses erase_and_free() for credential data to securely wipe it before releasing memory. For consistency with the credential cleanup infrastructure, consider using erase_and_free explicitly:

erase_and_free(existing->data);
existing->data = TAKE_PTR(combined);

Comment thread src/vmspawn/vmspawn.c Outdated
Comment on lines +1640 to +1641
log_error_errno(errno, "Failed to set $SYSTEMD_JOURNAL_REMOTE_CONFIG_FILE: %m");
_exit(EXIT_FAILURE);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: Claude: suggestion: This code runs in the child process of fork_notify(). The analogous setenv failure in fork_notify.c itself uses log_debug_errno and _exit(EXIT_MEMORY), since setenv failure is almost always ENOMEM. For consistency:

log_debug_errno(errno, "Failed to set $SYSTEMD_JOURNAL_REMOTE_CONFIG_FILE: %%m");
_exit(EXIT_MEMORY);

Comment thread src/vmspawn/vmspawn.c
" --image-disk-type=TYPE\n"
" Specify disk type (virtio-blk, virtio-scsi, nvme; default: virtio-blk)\n"
" Specify disk type (virtio-blk, virtio-scsi, nvme,\n"
" scsi-cd; default: virtio-blk)\n"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: Claude: suggestion: The --image-disk-type help text is updated to include scsi-cd, but the --extra-drive help text (line 281) still only lists virtio-blk, virtio-scsi, nvme. The man page systemd-vmspawn.xml --image-disk-type= description (line 134) also still lists only three types. Both should be updated to include scsi-cd.

Comment thread src/vmspawn/vmspawn.c Outdated
Comment on lines +3121 to +3125
_cleanup_free_ char *p = strjoin(tty_key, "=", term);
if (!p)
return log_oom();

r = strv_prepend(&arg_kernel_cmdline_extra, p);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: nit: Claude: nit: Since strjoin() already returns a freshly allocated string, strv_prepend() makes a redundant copy via its internal strdup(). Use strv_consume_prepend() with TAKE_PTR() to transfer ownership directly:

r = strv_consume_prepend(&arg_kernel_cmdline_extra, TAKE_PTR(p));

Comment thread src/vmspawn/vmspawn.c
" --user Interact with user manager\n"
" --system Interact with system manager\n"
" --runtime-scope=system|user\n"
" Run in system or user service manager scope\n"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: nit: Claude: nit: The continuation line uses 28-space indent but the rest of vmspawn's help text uses 27-space indent for continuations (see e.g. lines 229, 233, 241).

@daandemeyer
Copy link
Copy Markdown
Owner Author

@claude review

@daandemeyer
Copy link
Copy Markdown
Owner Author

@claude review

Comment thread src/shared/ptyfwd.c
* immediately. */
RequestOperation q = look_for_escape(f, f->in_buffer + f->in_buffer_full, k);
if (q < 0)
return q;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: must-fix: Claude: must-fix: When look_for_escape() returns an error (q < 0), this returns without incrementing f->in_buffer_full. The original code incremented the buffer immediately after calling look_for_escape(), ensuring the bytes read from stdin were always consumed regardless of the return value.

Without this increment, the read bytes remain unconsumed and could be re-processed or lost on the next iteration.

The same issue exists at line 685 when the hotkey handler returns an error — f->in_buffer_full is not incremented before returning.

Consider moving f->in_buffer_full += (size_t) k; back to immediately after the look_for_escape() call (before checking q), matching the original behavior:

RequestOperation q = look_for_escape(f, f->in_buffer + f->in_buffer_full, k);
f->in_buffer_full += (size_t) k;
if (q < 0)
        return q;

This would also fix the hotkey handler error path and eliminate the redundant increment in the REQUEST_EXIT branch.

Comment thread src/shared/machine-register.c Outdated
_cleanup_free_ char *p = NULL;
r = runtime_directory_generic(scope, "systemd/machine/io.systemd.Machine", &p);
if (r < 0)
return r;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: Claude: suggestion: If runtime_directory_generic() fails, this returns immediately without attempting D-Bus fallback, even though bus may be available. This is inconsistent with unregister_machine() (line 128-130) which gracefully falls through to D-Bus on the same failure.

Consider falling back to D-Bus when available:

r = runtime_directory_generic(scope, "systemd/machine/io.systemd.Machine", &p);
if (r < 0) {
        log_debug_errno(r, "Failed to determine runtime directory for varlink, falling back to D-Bus: %%m");

        if (\!bus)
                return r;

        return register_machine_dbus(bus, machine_name, uuid, service, class, pidref, directory, local_ifindex);
}

Comment thread src/vmspawn/vmspawn.c
/* Fall through to kernel command line if key is too long */
log_debug("fw_cfg key '%s' exceeds 55 character limit, falling back to kernel command line.", key);

if (strv_extendf(&arg_kernel_cmdline_extra,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: Claude: suggestion: When the fw_cfg key exceeds 55 characters, this falls through to appending to arg_kernel_cmdline_extra without checking that kernel is non-NULL. The else if (kernel) branch at line 3337 correctly guards against this, but this fallback path does not.

On aarch64 (where ARCHITECTURE_SUPPORTS_FW_CFG is true but ARCHITECTURE_SUPPORTS_SMBIOS is false), if booting from an image without an explicit kernel and a credential ID is long enough to exceed the 55-char fw_cfg key limit, the credential data will be silently added to arg_kernel_cmdline_extra but never delivered to the VM (since there's no kernel to pass it on the command line). Consider adding a kernel check here, or falling through to the log_warning at line 3342 when kernel is NULL.

Comment thread src/vmspawn/vmspawn.c
"format", image_format_to_string(arg_image_format),
"discard", on_off(arg_discard_disk),
"snapshot", on_off(arg_ephemeral));
if (arg_image_disk_type == DISK_TYPE_SCSI_CD)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: Claude: suggestion: When --image-disk-type=scsi-cd is used, the snapshot and discard drive options are silently dropped (since CD-ROMs are read-only). If the user also passed --ephemeral or --discard-disk, those flags are silently ignored with no warning. Consider emitting a log_warning (or returning an error) when --ephemeral or --discard-disk is combined with scsi-cd, so the user knows their flags have no effect.

Comment thread src/nspawn/nspawn.c
SET_FLAG(arg_mount_settings, MOUNT_APPLY_APIVFS_NETNS, arg_private_network);

if (!arg_privileged && arg_userns_mode != USER_NAMESPACE_MANAGED)
if (arg_runtime_scope != RUNTIME_SCOPE_SYSTEM && arg_userns_mode != USER_NAMESPACE_MANAGED)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: nit: Claude: nit: The error message says "Unprivileged operation requires managed user namespaces" but the condition was changed from !arg_privileged to arg_runtime_scope != RUNTIME_SCOPE_SYSTEM. With the new --runtime-scope= option, a root user could trigger this by passing --runtime-scope=user, making the "Unprivileged" wording misleading. Consider updating to "User-scoped operation requires managed user namespaces" to match the new semantics.

Pass -no-user-config while we're at it to avoid loading qemu config
from /etc which is more likely to cause hard to debug issues rather
than do something useful.
Improvements taken from mkosi. While we're at it, put more of the stuff
in config files.
If we're direct kernel booting on arm, smbios won't work. Add a
fwcfg and kernel command line fallback in case smbios isn't available.
QEMU's stdio chardev sets O_NONBLOCK on both stdin and stdout (see
chardev/char-stdio.c [1] and chardev/char-fd.c [2]). Since forked
processes share file descriptions, and on a terminal all three stdio
fds typically reference the same file description, this affects our
own stdio too.

Avoid this by using a PTY with chardev serial instead of chardev
stdio for native console mode, matching the approach already used
for interactive and read-only modes. The PTY forwarder shovels bytes
transparently between our stdio and QEMU's PTY using the new
PTY_FORWARD_DUMB_TERMINAL and PTY_FORWARD_TRANSPARENT flags, which
disable terminal decoration (background tinting, window title, OSC
context) and escape sequence handling (Ctrl-] exit, hotkeys)
respectively.

The chardev is configured with mux=on so the QEMU monitor remains
accessible via Ctrl-a c.

Also dedup CONSOLE_NATIVE, CONSOLE_READ_ONLY, and CONSOLE_INTERACTIVE
handling by using fallthrough, with the only differences being the
ptyfwd flags, mux setting, and monitor section.

[1] https://gitlab.com/qemu-project/qemu/-/blob/master/chardev/char-stdio.c
[2] https://gitlab.com/qemu-project/qemu/-/blob/master/chardev/char-fd.c

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Add --cxl=BOOL option to enable CXL (Compute Express Link) support in
the virtual machine. When enabled, adds cxl=on to the QEMU machine
configuration. Supported on x86 and ARM architectures, matching mkosi's
CXL= setting.

Extend --ram= to accept an optional maximum size for memory hotplug,
using the syntax --ram=SIZE[:MAXSIZE] (e.g. --ram=2G:8G). When a
maximum is specified, the maxmem key is added to the QEMU memory
configuration section to enable memory hotplug up to the given limit.

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Add --forward-journal-config=PATH to specify a custom configuration
file for systemd-journal-remote when forwarding journal entries from
the VM. Defaults to /dev/null when not specified, causing
systemd-journal-remote to ignore its default configuration files.
This allows callers like mkosi to control journal-remote settings
(MaxUse, KeepFree, MaxFileSize, etc.) without interfering with the
host's journal-remote configuration.

Add $SYSTEMD_JOURNAL_REMOTE_CONFIG_FILE environment variable support
to systemd-journal-remote. When set, the specified file is read
instead of the standard configuration file paths and drop-in
directories. When set to /dev/null, configuration file parsing is
skipped entirely.

Make fork_notify() argv parameter optional. When NULL is passed,
fork_notify() returns 0 in the child (with $NOTIFY_SOCKET set) and
lets the caller run custom code before exec. This allows vmspawn to
set $SYSTEMD_JOURNAL_REMOTE_CONFIG_FILE in the child environment
without polluting the parent process.

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Reduce virtiofsd log noise by setting --log-level=error, and drop
the unnecessary mknod capability with --modcaps=-mknod, matching
mkosi's virtiofsd invocation.

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
…nel cmdline

Switch runtime virtiofs mount configuration from systemd.mount-extra=
kernel command line parameters to the fstab.extra credential. This
avoids consuming kernel command line space (which is limited) and
matches the approach used by mkosi.

Each mount is added as an fstab entry in the format:
  {tag} {destination} virtiofs {ro|rw},x-initrd.mount

If the user already specified a fstab.extra credential via
--set-credential= or --load-credential=, the virtiofs mount entries
are appended to it rather than conflicting.

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
When running in a console mode (interactive, native, or read-only),
propagate the host's $TERM into the VM by adding TERM= and
systemd.tty.term.hvc0= to the kernel command line.

TERM= is picked up by PID 1 and inherited by services on /dev/console
(such as emergency.service). systemd.tty.term.hvc0= is used by services
directly attached to /dev/hvc0 (such as serial-getty@hvc0.service) which
look up $TERM via the systemd.tty.term.<tty> kernel command line
parameter.

While systemd can auto-detect the terminal type via DCS XTGETTCAP, not
all terminal emulators implement this, so explicitly propagating $TERM
provides a more reliable experience. We skip propagation when $TERM is
unset or set to "unknown" (as is the case in GitHub Actions and some
other CI environments).

Previously this was handled by mkosi synthesizing the corresponding
kernel command line parameters externally.

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
…ime_scope

Add --runtime-scope=system|user option to both nspawn and vmspawn
for explicit runtime scope selection. In vmspawn, --system and
--user are kept as undocumented compatibility aliases. The default
is auto-detected from the effective UID.

Replace all uses of the arg_privileged boolean with
arg_runtime_scope comparisons throughout nspawn.

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Add --forward-journal=FILE|DIR to forward the container's journal
entries to the host via systemd-journal-remote. When specified,
nspawn starts systemd-journal-remote listening on a Unix socket,
bind-mounts it into the container at /run/host/journal/socket, and
passes a journal.forward_to_socket credential pointing to it.

Add --forward-journal-config=PATH to specify a configuration file
for the spawned systemd-journal-remote instance.

Extract fork_journal_remote() into fork-notify.{c,h} as a shared
helper used by both nspawn and vmspawn.

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Move register_machine() and unregister_machine() from
vmspawn-register.{c,h} into shared machine-register.{c,h} so both
nspawn and vmspawn can use the same implementation.

The unified register_machine() uses varlink first (for richer
features like SSH support and unit allocation) with a D-Bus
RegisterMachineWithNetwork fallback for older machined. The
interface adds a class parameter ("vm" or "container") and
local_ifindex for nspawn's network interface support.

The unified unregister_machine() similarly tries varlink first
(io.systemd.Machine.Unregister) before falling back to D-Bus,
with debug logging when varlink operations are unavailable.

Remove register_machine() and unregister_machine() from
nspawn-register.c (keeping allocate_scope/terminate_scope which
remain nspawn-specific).

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Add DISK_TYPE_SCSI_CD to support attaching disk images as CD-ROM
drives, needed for testing El Torito ISO images built by
systemd-repart.

When --image-disk-type=scsi-cd is specified, the image is attached
with media=cdrom and readonly=on on the drive, using scsi-cd as the
device driver on the SCSI bus. This also works for --extra-drive=
with the scsi-cd: prefix.

The QEMU configuration matches the standard OVMF CD-ROM boot setup:
  -drive if=none,media=cdrom,format=raw,readonly=on
  -device virtio-scsi-pci
  -device scsi-cd

When direct kernel booting with scsi-cd, if the kernel command line
contains "rw", append "ro" to override it since CD-ROMs are
read-only.

Co-developed-by: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread src/nspawn/nspawn.c

/* Apply scope-dependent defaults now that arg_runtime_scope has its final value. -U sets
* arg_userns_mode to _USER_NAMESPACE_MODE_INVALID to defer the resolution to here. */
if (!FLAGS_SET(arg_settings_mask, SETTING_USERNS))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: must-fix: Claude: must-fix: verify_arguments() unconditionally overrides arg_userns_mode when SETTING_USERNS is not in arg_settings_mask. However, load_settings() runs before verify_arguments() and also sets arg_userns_mode under the same mask condition (from .nspawn settings files). This means if a .nspawn file configures PrivateUsers=pick, the value loaded from the file will be silently overwritten here with the scope-based default.

The same issue affects arg_private_network at line 1671.

In the old code, initialize_defaults() set these once (based on geteuid()), then load_settings() could override them, and verify_arguments() didn't touch them — so .nspawn file settings took effect. The new code breaks this layering.

Consider tracking whether load_settings() applied these values (e.g., by having it add to arg_settings_mask, or by checking if the value differs from the initial default) before overwriting.

# define ARCHITECTURE_SUPPORTS_FW_CFG 0
#endif

#if defined(__x86_64__) || defined(__i386__) || defined(__arm__) || defined(__aarch64__)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: Claude: suggestion: ARCHITECTURE_SUPPORTS_CXL includes 32-bit architectures (__i386__, __arm__), but QEMU CXL support requires 64-bit PCIe capabilities and is practically only available on q35 (x86_64) and virt (aarch64) machine types. QEMU will reject cxl=on for 32-bit targets. Consider restricting to __x86_64__ and __aarch64__ only.

Comment thread src/vmspawn/vmspawn.c
}

/* CD-ROMs are read-only, so override any "rw" on the kernel command line. */
if (arg_image_disk_type == DISK_TYPE_SCSI_CD && strv_contains(arg_kernel_cmdline_extra, "rw"))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: Claude: suggestion: Appending ro to counteract an earlier rw on the kernel command line relies on Linux "last one wins" semantics. This works for direct kernel boot (-append), but when the kernel cmdline is passed via SMBIOS credential (the else branch in cmdline_add_kernel_cmdline), the rw would still be present unmodified. Consider replacing the rw entry in arg_kernel_cmdline_extra rather than appending a contradictory ro, or validate that arg_linux is set when DISK_TYPE_SCSI_CD is used.

Comment thread src/vmspawn/vmspawn.c
if (!strv_isempty(arg_initrds) && !arg_linux)
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "Option --initrd= cannot be used without --linux=.");

if (arg_image_disk_type == DISK_TYPE_SCSI_CD) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: Claude: suggestion: Warnings were added for --ephemeral and --discard-disk with scsi-cd, but --grow-image is also silently skipped for scsi-cd (see the arg_image_disk_type != DISK_TYPE_SCSI_CD guard around grow_image() at line ~2894). A similar warning for --grow-image would help users understand why their image was not grown.

if (r < 0)
log_debug_errno(r, "Failed to connect to machined via varlink on %s, falling back to D-Bus: %m", p);
else {
r = varlink_callbo_and_log(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: Claude: suggestion: varlink_callbo_and_log() logs at error level on failure. Since this is used as a speculative "try varlink first" call with a D-Bus fallback in unregister_machine(), a varlink failure produces a confusing error-level log message followed by a debug-level "falling back to D-Bus" message. Consider using sd_varlink_call() directly and handling errors at debug level, or using a variant that does not log at error level for this speculative path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant